Rational structural genomics: affirmative action for ORFans and the growth in our structural knowledge.

Author

  • D Fischer
Abstract

Daniel Fischer1
Faculty of Natural Science, Department of Mathematics and Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
1To whom correspondence should be addressed; email: [email protected]

The determination of the complete genome sequences of organisms is producing an avalanche of protein sequences awaiting further structural and functional interpretation. Only a small fraction of the proteins encoded in these genomes has been experimentally studied, but putative functions for roughly 70% of the ORFs can be assigned via homology with characterized proteins in the databases. Similarly, although only a very small number of structures have been determined for these proteins, putative three-dimensional (3D) structures can currently be assigned to roughly 30% of the ORFs using fold assignment computational methods. Here I address the following questions. How fast is our structural knowledge growing? What is the distribution of assigned folds in the different functional categories? How might structure determination efforts be prioritized for maximum information and impact?

I have analyzed the 3D fold assignments for the genome of Mycoplasma genitalium (Fraser et al., 1995), which due to its small size has served as a minimal model organism for various studies. Several publications have reported different fractions of the genome for which 3D folds can be assigned. The earliest works reported fractions as low as 9 and 12% (Casari et al., 1996; Frishman and Mewes, 1997; Gerstein, 1997). Later works using methods aimed at detecting more distant relationships have increased this fraction to 25% (Fischer and Eisenberg, 1997), and more recently, up to around 40% (Huynen et al., 1998; Rychlewski et al., 1988; Teichmann et al., 1998; Jones, 1999; Wolf et al., 1999 and others; for recent reviews on this topic see Fischer and Eisenberg, 1999a; Teichmann et al., 1999). The differences in the reported fractions depend mainly on (i) the methods' sensitivities (the rate of true positives) and their selectivities (the rate of false positives); (ii) whether assignments are accounted for full structural domain matches or for only small sequence–structure segments and (iii) the date that the study was done (which determines the number of known sequences and structures and hence the number of sequences that can be assigned to known folds).

To evaluate how much the increase in the fraction of assignable ORFs depends on the number of available folds, I have compared the fold assignment of M.genitalium proteins obtained by one particular method using three different sets of structures. The method used in this comparison (Fischer and Eisenberg, 1997) is aimed at detecting full structural domain matches and uses rather conservative thresholds (the method chosen to carry out this comparison is irrelevant; qualitatively similar results are likely to be obtained with any other method). When using only those structures available before 1996 only 20% of the genome could be assigned a fold. With structures from the PDB available in April 1997, 25% of the genome was assigned a fold (Fischer and Eisenberg, 1997). When using all the structures available in October 1998, the fraction of assigned proteins reached 32% (see http://www.doe-mbi.ucla.edu/people/frsvr/preds/MG/MG.html). This indicates that because of the availability of more structures, the fraction of assignable ORFs has increased at an annual rate of roughly 18% (Fischer and Eisenberg, 1999a; see also Teichmann et al., 1999 and references therein).

Will the rate of increase in fold assignment be sustained throughout the next few years? To address this question, I have analyzed the distribution of the fold assignments of M.genitalium among the various functional categories described by Fraser et al. (1995). Table I shows that the three categories with the largest percentages of folds assigned are purine metabolism, energy metabolism and translation-tRNA. For example, all but two ORFs in the first category have been assigned a fold. As expected, and mostly due to the difficulties in determining the structures of membrane proteins, the three least covered categories are cell envelope, unknown and transport. The last column in Table I shows that the largest number of non-membrane proteins with no assigned fold belong in the unknown and ribosomal categories (ORFs characterized as membranal or with putative transmembrane helices were excluded).

The fraction of assignable ORFs will undoubtedly continue to grow in the next few years, because new structures will continue to be determined in most of the functional categories. However, because in several functional categories only a few ORFs lack structural assignments, if structure determination continues to concentrate on the best represented categories, the fraction of assignable ORFs will soon reach a plateau.

A 'rational' approach to structural genomics (Fischer and Eisenberg, 1997; Kim, 1997; Rost, 1998; Teichmann et al., 1999) could significantly advance our knowledge by selecting for structural determination studies those proteins in the categories with fewer assigned structures. 153 ORFs (32% of the genome) belong to the unknown category, of which 97 [21% of the genome or 43% (97/224) of the unassigned ORFs] correspond to soluble proteins with no functional or structural information whatsoever. Roughly half of them match proteins of unknown function from other organisms, indicating that they are conserved proteins in various organisms. The other half of the ORFs in the 'unknown' category show no sequence similarity to any protein of other organisms (excluding the close relative M.pneumoniae). If these orphan ORFs, or ORFans for short (Fischer and Eisenberg, 1999b), code for expressed proteins (Dujon et al., 1994; Goffeau et al., 1996), they will correspond to unique proteins with novel functions or to very distant members of known families. Thus, these ORFans are likely to be among the most interesting targets for further structural and functional studies. Characterizing ORFans (Fischer and Eisenberg, 1999b) and conserved proteins of unknown function (Zarembinski et al., 1998) will be essential to fully understand the genetic material. Knowing their structures will considerably contribute to our understanding of protein structure, function and evolution.
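
The annual rate of roughly 18% quoted in the text can be checked against the three coverage figures it reports (20% before 1996, 25% in April 1997, 32% in October 1998). The sketch below assumes a compound annual growth rate, which the text does not state was the method used; the interval lengths in years, and the reading of "before 1996" as January 1996, are also assumptions made for illustration. It also rechecks the 97/224 share of unassigned ORFs.

```python
def annual_growth_rate(start_fraction: float, end_fraction: float, years: float) -> float:
    """Compound annual growth rate between two genome-coverage fractions."""
    return (end_fraction / start_fraction) ** (1.0 / years) - 1.0


# April 1997 (25%) to October 1998 (32%): 1.5 years between the two dates.
rate_1997_1998 = annual_growth_rate(0.25, 0.32, 1.5)

# "Before 1996" (20%) to October 1998 (32%): 2.75 years,
# ASSUMING the first data set is dated January 1996.
rate_1996_1998 = annual_growth_rate(0.20, 0.32, 2.75)

# Soluble 'unknown' ORFs as a share of all unassigned ORFs (counts from the text).
unknown_soluble_share = 97 / 224

print(f"1997-1998 annual growth: {rate_1997_1998:.1%}")
print(f"1996-1998 annual growth: {rate_1996_1998:.1%}")
print(f"Share of unassigned ORFs: {unknown_soluble_share:.1%}")
```

Under these assumptions both intervals give a rate close to 18% per year, consistent with the figure in the text, and 97/224 rounds to the 43% quoted.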




Journal:
  • Protein engineering

Volume 12, Issue 12

Pages -

Publication date: 1999